Brief Communication: Rearrangements of DNA Sequences and SBH*

Author

  • A. Pevzner
Abstract

Despite recent advances in DNA sequencing by hybridization (SBH), it is still a random shotgun method. Even if one manages to routinely sequence short DNA fragments by SBH, these fragments have to be assembled into the final genomic sequence. Recently, different additional biochemical experiments were suggested which may drastically increase the resolving power of SBH. However, biologists frequently cannot estimate the computational limitations of the proposed additional experiments, and no computational studies of additional experiments for SBH have been provided yet. The paper discusses a combinatorial technique which might help a biologist to analyze different additional biochemical experiments and to combine these data with SBH data to increase the resolving power of SBH.

DNA sequencing by hybridization (SBH) is a challenging alternative to the classical DNA sequencing methods. The basic approach is to build an array (Sequencing Chip) of short oligonucleotides, to use hybridization to find the oligonucleotide (q-gram) composition of an unknown DNA fragment, and to reconstruct the fragment by a combinatorial algorithm [see Pevzner & Lipshutz (1994) for a recent review]. A number of major breakthroughs in SBH were reported recently: Fodor et al. (1991) developed a photolithographic technique for building chips, Southern et al. (1992) built the first sequencing chip, and Drmanac et al. (1993) read the first DNA fragment by SBH. However, the original SBH chip containing all 4^8 = 65536 octanucleotides is insufficient for sequencing long DNA fragments. In particular, Pevzner (1989) demonstrated that even in the case of an ideal (errorless) SBH experiment, one can hope to reconstruct a 200-nucleotide sequence in only 94 out of 100 cases. To increase the resolving power of SBH chips, Bains (1991), Pevzner et al. (1991) and Pevzner & Lipshutz (1994) developed different optimized chips; in particular, binary chips allow one to sequence an 1800 bp fragment using a 64 kb chip.
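To make the reconstruction ambiguity concrete, the following Python sketch computes a q-gram spectrum and enumerates, by brute force, every sequence of the same length that has exactly the same spectrum. It assumes an idealized, errorless experiment that reports each q-gram together with its multiplicity (the q-gram composition), whereas a real chip reports only presence or absence. The names `spectrum` and `consistent_sequences` are hypothetical, and the exhaustive search is only feasible for toy lengths; it is not the combinatorial reconstruction algorithm used in practice.

```python
from collections import Counter
from itertools import product


def spectrum(seq: str, q: int) -> Counter:
    """q-gram composition of seq: each length-q substring with its multiplicity.

    Assumption: an idealized experiment that also reports multiplicities.
    """
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def consistent_sequences(target: str, q: int, alphabet: str = "ACGT") -> list[str]:
    """Brute force: every string of len(target) over alphabet with target's q-gram composition.

    Exponential (len(alphabet) ** len(target) candidates), so toy lengths only.
    """
    goal = spectrum(target, q)
    hits = []
    for letters in product(alphabet, repeat=len(target)):
        candidate = "".join(letters)
        if spectrum(candidate, q) == goal:
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    print(4 ** 8)  # 65536 distinct octanucleotides on a full 8-mer chip
    print(consistent_sequences("ACGTAC", 3))
    # ['ACGTAC', 'CGTACG', 'GTACGT', 'TACGTA']
```

Even this 6-nucleotide target with q = 3 has four consistent sequences: because it begins and ends with the same 2-mer (AC), every restart point of the same closed walk yields an equally valid answer.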
However, despite recent advances, SBH is still a random shotgun method. Even if one manages to routinely sequence 200-300 bp fragments with 64 kb SBH chips (1500-1800 bp with binary chips), these fragments have to be assembled into the final genomic sequence. Recently, different additional biochemical experiments were suggested which may make SBH a true genomic sequencing method (Drmanac et al., 1989; Khrapko et al., 1989; Drmanac & Crkvenjakov, 1992; Chetverin & Kramer, 1993). However, biologists frequently cannot estimate the computational limitations of the proposed additional experiments and, to our knowledge, almost no computational studies of additional experiments for SBH have been provided yet. Hence biologists are unable to make even a preliminary judgement on how much the proposed additional experiments increase the resolving power of SBH. To analyze different additional biochemical experiments, one needs a characterization theorem describing all DNA sequences with a given SBH spectrum. The paper studies a compact description of all DNA sequences with the same SBH spectrum (i.e. with the same q-gram composition, where q is the length of the oligonucleotides on the chip) in terms of equivalent transformations. This problem is related to a fingerprint technique in combinatorial pattern matching recently analyzed by Ukkonen (1992). Studying the q-gram distance between texts, Ukkonen (1992) raised the problem of word transformations preserving q-gram distance and conjectured that all words with the same q-gram composition can be transformed into each other by means of natural transformations (transpositions and rotations).

*Some topics included in this work were discussed at the Third International Workshop on Open Problems of Computational Molecular Biology, Telluride, Colo., 11-25 July 1993.
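The two transformations named in Ukkonen's conjecture can be illustrated in the usual graph view, where (q-1)-mers are nodes, each q-gram is an edge, and a sequence spells a walk. The sketch below is one possible formalization (an assumption, not necessarily Ukkonen's exact definition): a transposition swaps the two stretches of the walk that both lead from a repeated (q-1)-mer x to a repeated (q-1)-mer y, and a rotation restarts a walk that begins and ends at the same (q-1)-mer. The helpers `composition`, `transpose` and `rotate` are hypothetical names; the demo checks that both operations leave the q-gram composition unchanged.

```python
from collections import Counter


def composition(s: str, q: int) -> Counter:
    """q-gram composition: each length-q substring of s with its multiplicity."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def transpose(s: str, q: int, i1: int, j1: int, i2: int, j2: int) -> str:
    """Swap the two x -> y stretches of the walk spelled by s (hypothetical helper).

    x = s[i1:i1+k] = s[i2:i2+k] and y = s[j1:j1+k] = s[j2:j2+k] are (q-1)-mers
    (k = q - 1) occurring in the order x, y, x, y, i.e. i1 < j1 < i2 < j2.
    """
    k = q - 1
    if k < 1:
        raise ValueError("q must be at least 2")
    if not 0 <= i1 < j1 < i2 < j2 <= len(s) - k:
        raise ValueError("need 0 <= i1 < j1 < i2 < j2 <= len(s) - (q - 1)")
    if s[i1:i1 + k] != s[i2:i2 + k] or s[j1:j1 + k] != s[j2:j2 + k]:
        raise ValueError("positions do not hold repeated (q-1)-mers x and y")
    return (s[:i1 + k]            # prefix, ending with the first x
            + s[i2 + k:j2 + k]    # second x -> y stretch, moved forward
            + s[j1 + k:i2 + k]    # y -> x stretch, unchanged
            + s[i1 + k:j1 + k]    # first x -> y stretch, moved back
            + s[j2 + k:])         # suffix after the second y


def rotate(s: str, q: int, p: int) -> str:
    """Restart a closed walk (s begins and ends with the same (q-1)-mer) at position p."""
    k = q - 1
    if k < 1:
        raise ValueError("q must be at least 2")
    if s[:k] != s[-k:]:
        raise ValueError("s must begin and end with the same (q-1)-mer")
    if not 0 < p < len(s) - k:
        raise ValueError("p must satisfy 0 < p < len(s) - (q - 1)")
    return s[p:] + s[k:p + k]


if __name__ == "__main__":
    s = "ACAGTTACCGT"  # AC.A.GT.T.AC.C.GT with x = AC, y = GT
    t = transpose(s, 3, 0, 3, 6, 9)
    print(t, composition(s, 3) == composition(t, 3))  # ACCGTTACAGT True
    r = rotate("ACGTAC", 3, 2)
    print(r, composition("ACGTAC", 3) == composition(r, 3))  # GTACGT True
```

Here ACAGTTACCGT reads as AC·A·GT·T·AC·C·GT, so swapping the pieces A and C that sit between each x and the following y gives ACCGTTACAGT, which has the identical multiset of 3-grams.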
The proposed characterization of sequences with the same SBH spectrum is based on a combinatorial technique (alternating cycles in colored graphs) which allows one to analyze different additional data about the sequence in the framework of the same theoretical model. This technique has already led us to the

Similar resources

Optimal Probing Patterns for Sequencing by Hybridization

Sequencing by Hybridization (SBH) is a method for reconstructing a DNA sequence based on its k-mer content. This content, called the spectrum of the sequence, can be obtained from hybridization with a universal DNA chip. The main shortcoming of SBH is that it reliably reconstructs only sequences of length at most square root of the size of the chip. Frieze et al. [9] showed that by using gapped...

Sequencing by Hybridization – A Simulation Study of Performance on Genomic Sequences

Sequencing by Hybridization (SBH)[1,2] is a theoretical method for de-novo sequencing of DNA by means of reconstruction of the sequence from its hybridization pattern. Typically, arraying a complete set of k-mers is considered, although different setups, such as degenerate probe arrays, are also possible [3,5]. While this method is not competitive with current biochemical sequencing methods, va...

Using Restriction Enzymes to Improve Sequencing by Hybridization

The expected number of n-long DNA sequences, which are consistent with a given SBH spectrum, grows exponentially with n. In this work we show that by incorporating data from a small number of restriction enzyme digestion assays (REs), the number of consistent sequences decreases significantly. We describe computational techniques that enable the reconstruction of sequences consistent with hybrid...

Tabu search method for DNA sequencing by hybridization with isothermic libraries

In this work, a problem of DNA sequencing by hybridization (SBH) with isothermic libraries is considered. The classical approach to SBH uses sets of oligonucleotides of equal lengths [1, 2, 8, 10, 6, 9, 7]. In the isothermic SBH approach, new sets of oligonucleotides are used. The library consists of oligonucleotides with the same melting temperature but different lengths [3, 4]. Every nucleotide ad...

Restricting SBH Ambiguity via Restriction Enzymes

Sequencing by hybridization (SBH) is a proposed approach to DNA sequencing. The SBH-spectrum of the target sequence is a list of all k-mers occurring at least once in the sequence. Sequencing is successful if the SBH-spectrum is a result of only that sequence and ambiguous otherwise. Unfortunately, the expected number of sequences consistent with a given spectrum increases exponentially with th...

Publication date: 1993